SYNTHETIC PLANT GENES AND METHOD FOR PREPARATION

BACKGROUND OF THE INVENTION

The present invention relates to genetic 
engineering and more particularly to plant 
transformation in which a plant is transformed to 
express a heterologous gene. 

Although great progress has been made in recent years
with respect to transgenic plants which express
foreign proteins such as herbicide resistant enzymes
and viral coat proteins, very little is known about
the major factors affecting expression of foreign
genes in plants. Several potential factors could be
responsible in varying degrees for the level of
protein expression from a particular coding sequence.
The level of a particular mRNA in the cell is
certainly a critical factor.

The potential causes of low steady state levels of
mRNA due to the nature of the coding sequence are
many. First, full length RNA synthesis might not
occur at a high frequency. This could, for example,
be caused by the premature termination of RNA during
transcription or due to unexpected mRNA processing
during transcription. Second, full length RNA could
be produced but then processed (splicing, polyA
addition) in the nucleus in a fashion that creates a
nonfunctional mRNA. If the RNA is properly
synthesized, terminated and polyadenylated, it then
can move to the cytoplasm for translation. In the
cytoplasm, mRNAs have distinct half lives that are
determined by their sequences and by the cell type in
which they are expressed. Some RNAs are very
short-lived and some are much more long-lived. In addition,
there is an effect, whose magnitude is uncertain, of
translational efficiency on mRNA half-life. In
addition, every RNA molecule folds into a particular
structure, or perhaps family of structures, which is
determined by its sequence. The particular structure
of any RNA might lead to greater or lesser stability
in the cytoplasm. Structure per se is probably also a
determinant of mRNA processing in the nucleus.
Unfortunately, it is impossible to predict, and nearly
impossible to determine, the structure of any RNA
(except for tRNA) in vitro or in vivo. However, it is
likely that dramatically changing the sequence of an
RNA will have a large effect on its folded structure.
It is likely that structure per se or particular
structural features also have a role in determining
RNA stability.

Some particular sequences and signals have been
identified in RNAs that have the potential for having
a specific effect on RNA stability. This section
summarizes what is known about these sequences and
signals. These identified sequences often are A+T
rich, and thus are more likely to occur in an A+T rich
coding sequence such as a B.t. gene. The sequence
motif ATTTA (or AUUUA as it appears in RNA) has been
implicated as a destabilizing sequence in mammalian
cell mRNA (Shaw and Kamen, 1986). No analysis of the
function of this sequence in plants has been done.
Many short lived mRNAs have A+T rich 3' untranslated
regions, and these regions often have the ATTTA
sequence, sometimes present in multiple copies or as
multimers (e.g., ATTTATTTA...). Shaw and Kamen showed
that the transfer of the 3' end of an unstable mRNA to
a stable RNA (globin or VA1) decreased the stable
RNA's half life dramatically. They further showed
that a pentamer of ATTTA had a profound destabilizing
effect on a stable message, and that this signal could
exert its effect whether it was located at the 3' end
or within the coding sequence. However, the number of
ATTTA sequences and/or the sequence context in which
they occur also appear to be important in determining
whether they function as destabilizing sequences.
Shaw and Kamen showed that a trimer of ATTTA had much
less effect than a pentamer on mRNA stability and a
dimer or a monomer had no effect on stability (Shaw
and Kamen, 1987). Note that multimers of ATTTA such
as a pentamer automatically create an A+T rich region.
This was shown to be a cytoplasmic effect, not
nuclear. In other unstable mRNAs, the ATTTA sequence
may be present in only a single copy, but it is often
contained in an A+T rich region. From the animal cell
data collected to date, it appears that ATTTA at least
in some contexts is important in stability, but it is
not yet possible to predict which occurrences of ATTTA
are destabilizing elements or whether any of these
effects are likely to be seen in plants.
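
The kind of sequence scan implied by the discussion above can be sketched as follows. This is a minimal illustration only, not a procedure taken from the cited studies: the function names are hypothetical, and the treatment of a multimer as tandem copies sharing their terminal A (as in ATTTATTTA) is an assumption based on the example given in the text. As noted above, locating a motif does not establish that it functions as a destabilizing element.

```python
import re

def find_attta_sites(seq):
    """Return 0-based start positions of every ATTTA (overlaps allowed)."""
    # Accept RNA input by mapping U to T (AUUUA becomes ATTTA).
    seq = seq.upper().replace("U", "T")
    # A zero-width lookahead lets overlapping copies be reported separately.
    return [m.start() for m in re.finditer(r"(?=ATTTA)", seq)]

def find_attta_multimers(seq, min_repeats=2):
    """Return (start, repeat_count) for tandem multimers such as ATTTATTTA.

    Assumption: copies in a multimer share their terminal A, so a
    multimer of n copies reads A followed by n copies of TTTA.
    """
    seq = seq.upper().replace("U", "T")
    pattern = re.compile(r"A(?:TTTA){%d,}" % min_repeats)
    return [(m.start(), (len(m.group()) - 1) // 4)
            for m in pattern.finditer(seq)]
```

For example, `find_attta_multimers("CCATTTATTTATTTAGG")` reports one trimer starting at position 2, while `find_attta_sites` on the same input reports each of the three overlapping copies.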

Some studies on mRNA degradation in animal cells
also indicate that RNA degradation may begin in some
cases with nucleolytic attack in A+T rich regions. It
is not clear if these cleavages occur at ATTTA
sequences. There are also examples of mRNAs that have
differential stability depending on the cell type in
which they are expressed or on the stage within the
cell cycle at which they are expressed. For example,
histone mRNAs are stable during DNA synthesis but
unstable if DNA synthesis is disrupted. The 3' end of
some histone mRNAs seems to be responsible for this
effect (Pandey and Marzluff, 1987). It does not
appear to be mediated by ATTTA, nor is it clear what
controls the differential stability of this mRNA.
Another example is the differential stability of IgG
mRNA in B lymphocytes during B cell maturation
(Genovese and Milcarek, 1988). A final example is the
instability of a mutant beta-thalassemic globin mRNA.
In bone marrow cells, where this gene is normally
expressed, the mutant mRNA is unstable, while the
wild-type mRNA is stable. When the mutant gene is
expressed in HeLa or L cells in vitro, the mutant mRNA
shows no instability (Lim et al., 1988). These
examples all provide evidence that mRNA stability can
be mediated by cell type or cell cycle specific
factors. Furthermore, this type of instability is not
yet associated with specific sequences. Given these
uncertainties, it is not possible to predict which
RNAs are likely to be unstable in a given cell. In
addition, even the ATTTA motif may act differentially
depending on the nature of the cell in which the RNA
is present. Shaw and Kamen (1987) have reported that
activation of protein kinase C can block degradation
mediated by ATTTA.
The addition of a polyadenylate string to the 3'
end is common to most eucaryotic mRNAs, both plant and
animal. The currently accepted view of polyA addition
is that the nascent transcript extends beyond the
mature 3' terminus. Contained within this transcript
are signals for polyadenylation and proper 3' end
formation. This processing at the 3' end involves
cleavage of the mRNA and addition of polyA to the
mature 3' end. By searching for consensus sequences
near the polyA tract in both plant and animal mRNAs,
it has been possible to identify consensus sequences
that apparently are involved in polyA addition and 3'
end cleavage. The same consensus sequences seem to be
important to both of these processes. These signals
are typically a variation on the sequence AATAAA. In
animal cells, some variants of this sequence that are
functional have been identified; in plant cells there
seems to be an extended range of functional sequences
(Wickens and Stephenson, 1984; Dean et al., 1986).
Because all of these consensus sequences are
variations on AATAAA, they all are A+T rich sequences.
This sequence is typically found 15 to 20 bp before
the polyA tract in a mature mRNA. Experiments in
animal cells indicate that this sequence is involved
in both polyA addition and 3' maturation. Site-directed
mutations in this sequence can disrupt these
functions (Conway and Wickens, 1988; Wickens et al.,
1987). However, it has also been observed that
sequences up to 50 to 100 bp 3' to the putative polyA
signal are also required; i.e., a gene that has a
normal AATAAA but has been replaced or disrupted
downstream does not get properly polyadenylated (Gil
and Proudfoot, 1984; Sadofsky and Alwine, 1984;
McDevitt et al., 1984). That is, the polyA signal
itself is not sufficient for complete and proper
processing. It is not yet known what specific
downstream sequences are required in addition to the
polyA signal, or if there is a specific sequence that
has this function. Therefore, sequence analysis can
only identify potential polyA signals.
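
A sequence analysis of the kind referred to above can be sketched as a scan for AATAAA and close variants. This is a minimal illustration under stated assumptions: the function name is hypothetical, and treating any hexamer within one mismatch of AATAAA as a candidate is a simplification chosen for the example, not a list of variants taken from the cited studies. Consistent with the text, a hit identifies only a potential signal.

```python
def find_putative_polya_signals(seq, consensus="AATAAA", max_mismatches=1):
    """Return (start, hexamer, mismatches) for windows close to the consensus.

    Assumption: "variation on AATAAA" is approximated here as any window
    differing from the consensus at no more than max_mismatches positions.
    """
    seq = seq.upper().replace("U", "T")
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Count positions at which the window differs from the consensus.
        mismatches = sum(a != b for a, b in zip(window, consensus))
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits
```

Setting `max_mismatches=0` restricts the scan to exact AATAAA; larger values widen the set of candidates, which reflects the extended range of sequences reported to function in plants but also increases the number of sites that may not be functional.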

In naturally occurring mRNAs that are normally
polyadenylated, it has been observed that disruption
of this process, either by altering the polyA signal
or other sequences in the mRNA, can have profound
effects on the level of functional mRNA. This has
been observed in several naturally occurring mRNAs,
with results that are gene specific so far. There are
no general rules that can be derived yet from the
study of mutants of these natural genes, and no rules
that can be applied to heterologous genes. Below are
four examples:

1. In a globin gene, absence of a proper polyA
site leads to improper termination of transcription.
It is likely, but not proven, that the improperly
terminated RNA is nonfunctional and unstable
(Proudfoot et al., 1987).

2. In a globin gene, absence of a functional 
polyA signal can lead to a 100-fold decrease in the 
level of mRNA accumulation (Proudfoot et al., 1987). 

3. A globin gene polyA site was placed into the
3' ends of two different histone genes. The histone
genes contain a secondary structure (stem-loop) near
their 3' ends. The amount of properly polyadenylated
histone mRNA produced from these chimeras decreased as
the distance between the stem-loop and the polyA site
increased. Also, the two histone genes produced
greatly different levels of properly polyadenylated
mRNA. This suggests an interaction between the polyA
site and other sequences on the mRNA that can modulate
mRNA accumulation (Pandey and Marzluff, 1987).

4. The soybean leghemoglobin gene has been cloned 
into HeLa cells, and it has been determined that this 
plant gene contains a "cryptic" polyadenylation signal
that is active in animal cells, but is not utilized in 
plant cells. This leads to the production of a new 
polyadenylated mRNA that is nonfunctional. This again 
shows that analysis of a gene in one cell type cannot 
predict its behavior in alternative cell types 
(Wiebauer et al., 1988). 

From these examples, it is clear that in natural
mRNAs proper polyadenylation is important in mRNA
accumulation, and that disruption of this process can
affect mRNA levels significantly. However,
insufficient knowledge exists to predict the effect of
changes in a normal gene. In a heterologous gene,
where we do not know if the putative polyA sites
(consensus sequences) are functional, it is even
harder to predict the consequences. However, it is
possible that the putative sites identified are
dysfunctional. That is, these sites may not act as
proper polyA sites, but instead function as aberrant
sites that give rise to unstable mRNAs.
In animal cell systems, AATAAA is by far the most
common signal identified in mRNAs upstream of the
polyA, but at least four variants have also been found
(Wickens and Stephenson, 1984). In plants, not nearly
so much analysis has been done, but it is clear that
multiple sequences similar to AATAAA can be used. The
plant sites below called major or minor refer only to
the study of Dean et al. (1986) which analyzed only
three types of plant gene. The designation of
polyadenylation sites as major or minor refers only to
the frequency of their occurrence as functional sites
in naturally occurring genes that have been analyzed.
In the case of plants this is a very limited database.
It is hard to predict with any certainty that a site
designated major or minor is more or less likely to
function partially or completely when found in a 
Another type of RNA processing that occurs in the
nucleus is intron splicing. Nearly all of the work on
intron processing has been done in animal cells, but
some data is emerging from plants. Intron processing
depends on proper 5' and 3' splice junction sequences.
Consensus sequences for these junctions have been
derived for both animal and plant mRNAs, but only a
few nucleotides are known to be invariant. Therefore,
it is hard to predict with any certainty whether a
putative splice junction is functional or partially
functional based solely on sequence analysis. In
particular, the only invariant nucleotides are GT at
the 5' end of the intron and AG at the 3' end of the
intron. In plants, at every nearby position, either
within the intron or in the exon flanking the intron,
all four nucleotides can be found, although some
positions show some nucleotide preference (Brown,
1955; Hanley and Schuler, 1988).



A plant intron has been moved from a patatin gene
into a GUS gene. To do this, site-directed
mutagenesis was performed to introduce new restriction
sites, and this mutagenesis changed several
nucleotides in the intron and exon sequences flanking
the GT and AG. This intron still functioned properly,
indicating the importance of the GT and AG and the
flexibility at other nucleotide positions. There are
of course many occurrences of GT and AG in all genes
that do not function as intron splice junctions, so
there must be some other sequence or structural
features that identify splice junctions. In plants,
one such feature appears to be base composition per
se. Wiebauer et al. (1988) and Goodall et al. (1988)
have analyzed plant introns and exons and found that
exons have ~50% A+T while introns have ~70% A+T.
Goodall et al. (1988) also created an artificial plant
intron that has consensus 5' and 3' splice junctions
and a random A+T rich internal sequence. This intron
was spliced correctly in plants. When the internal
segment was replaced by a G+C rich sequence, splicing
efficiency was drastically reduced. These two
examples demonstrate that intron recognition in
plants may depend on very general features: splice
junctions that have a great deal of sequence diversity
and A+T richness of the intron itself. This, of
course, makes it difficult to predict from sequence
alone whether any particular sequence is likely to
function as an active or partially active intron for
RNA processing.

B.t. genes being A+T rich contain numerous 
stretches of various lengths that have 70% or greater 
A+T. The number of such stretches identified by 
sequence analysis depends on the length of sequence 
scanned. 
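
The dependence on the length of sequence scanned can be illustrated with a simple sliding-window calculation. This is a minimal sketch: the function names, the default window length of 20 and the toy sequence in the usage note are assumptions chosen for illustration and are not taken from any B.t. gene.

```python
def at_fraction(seq):
    """Return the fraction of bases in seq that are A or T (U counts as T)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(seq.count(base) for base in "ATU") / len(seq)

def at_rich_windows(seq, window=20, threshold=0.70):
    """Return 0-based starts of every window with A+T fraction >= threshold."""
    seq = seq.upper()
    return [i for i in range(len(seq) - window + 1)
            if at_fraction(seq[i:i + window]) >= threshold]
```

On a hypothetical 20-base sequence made of ten G/C bases followed by ten A/T bases, a 5-base window reports seven qualifying stretches, a 10-base window reports four, and a 20-base window reports none, since the sequence as a whole is only 50% A+T.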

PCT/US90/00778

As for polyadenylation described above, there are
complications in predicting what sequences might be
utilized as splice sites in any given gene. First,
many naturally occurring genes have alternative
splicing pathways that create alternative combinations
of exons in the final mRNA (Gallega and Nadal-Ginard,
1988; Helfman and Ricci, 1988; Tsurushita and Korn,
1989). That is, some splice junctions are apparently
recognized under some circumstances or in certain cell
types, but not in others. The rules governing this
are not understood. In addition, there can be an
interaction between processing paths such that
utilization of a particular polyadenylation site can
interfere with splicing at a nearby splice site and
vice versa (Adami and Nevins, 1988; Brady and Wold,
1988; Marzluff and Pandey, 1988). Again no predictive
rules are available. Also, sequence changes in a gene
can drastically alter the utilization of particular
splice junctions. For example, in a bovine growth
hormone gene, small deletions in an exon a few hundred
bases downstream of an intron cause the splicing
efficiency of the intron to drop from greater than 95%
to less than 2% (essentially nonfunctional). Other
deletions however have essentially no effect (Hampson
and Rottman, 1988). Finally, a variety of in vitro
and in vivo experiments indicate that mutations that
disrupt normal splicing lead to rapid degradation of
the RNA in the nucleus. Splicing is a multistep
process in the nucleus and mutations in normal
splicing can lead to blockades in the process at a
variety of steps. Any of these blockades can then
lead to an abnormal and unstable RNA. Studies of
mutants of normally processed (polyadenylation and
splicing) genes are relevant to the study of
heterologous genes such as B.t. B.t. genes might
contain functional signals that lead to the production
of aberrant nonfunctional mRNAs, and these mRNAs are
likely to be unstable. But the B.t. genes are perhaps
even more likely to contain signals that are analogous
to mutant signals in a natural gene. As shown above,
these mutant signals are very likely to cause defects
in the processing pathways whose consequence is to
produce unstable mRNAs.

It is not known with any certainty what signals RNA
transcription termination in plant or animal cells.
Some studies on animal genes indicate that
stretches of sequence rich in T cause termination by
calf thymus RNA polymerase II in vitro. These studies
have shown that the 3' ends of in vitro terminated
transcripts often lie within runs of T such as T5, T6
or T7. Other identified sites have not been composed
solely of T, but have had one or more other
nucleotides as well. Termination has been found to
occur within the sequences TATTTTTT, ATTCTC, TTCTT
(Dedrick et al., 1987; Reines et al., 1987). In the
case of these latter two, the context in which the
sequence is found has been C+T rich as well. It is
not known if this is essential. Other studies have
implicated stretches of A as potential transcriptional
terminators. An interesting example from SV40
illustrates the uncertainty in defining terminators
based on sequence alone. One potential terminator in
SV40 was identified as being A rich and having a
region of dyad symmetry (potential stem-loop) 5' to
the A rich stretch. However, a second terminator
identified experimentally downstream in the same gene
was not A rich and included no potential secondary
structure (Kessler et al., 1988). Of course, due to
the A+T content of B.t. genes, they are rich in runs
of A or T that could act as terminators. The
importance of termination to stability of the mRNA is
shown by the globin gene example described above.
Absence of a normal polyA site leads to a failure in
proper termination with a consequent decrease in mRNA.
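
Runs of A or T of the kind discussed above can be located with a short scan. This is a minimal sketch only: the function name is hypothetical, and the default minimum run length of five is an assumption taken from the shortest run mentioned above (T5). As the SV40 example shows, the presence of such a run does not establish that it acts as a terminator.

```python
import re

def find_homopolymer_runs(seq, bases="AT", min_len=5):
    """Return (start, run) for each run of a single base at least min_len long.

    Only the bases listed in `bases` are considered; U is treated as T.
    """
    seq = seq.upper().replace("U", "T")
    # Build one alternative per base, e.g. "A{5,}|T{5,}".
    pattern = re.compile("|".join("%s{%d,}" % (b, min_len) for b in bases))
    return [(m.start(), m.group()) for m in pattern.finditer(seq)]
```

Passing `bases="T"` restricts the scan to runs of T, and `min_len` can be raised to report only longer runs such as T6 or T7.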

There is also an effect on mRNA stability due to the
translation of the mRNA. Premature translational
termination in human triose phosphate isomerase leads
to instability of the mRNA (Daar et al., 1988).
Another example is the beta-thalassemic globin mRNA
described above that is specifically unstable in bone
marrow cells (Lim et al., 1988). The defect in this
mutant gene is a single base pair deletion at codon 44
that leads to translational termination (a nonsense
codon) at codon 60. Compared to properly translated
normal globin mRNA, this mutant RNA is very unstable.
These results indicate that an improperly translated
mRNA is unstable. Other work in yeast indicates that
proper but poor translation can have an effect on mRNA
levels. A heterologous gene was modified to convert
certain codons to more yeast preferred codons. An
overall 10-fold increase in protein production was
achieved, but there was also about a 3-fold increase
in mRNA (Hoekema et al., 1987). This indicates that
more efficient translation can lead to greater mRNA
stability, and that the effect of codon usage can be
at the RNA level as well as the translational level.
It is not clear from codon usage studies which codons
lead to poor translation, or how this is coupled to
mRNA stability.
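
The way a single base deletion can bring a nonsense codon into frame, as in the globin example above, can be sketched as follows. This is a minimal illustration: the function names are hypothetical and the short coding sequence in the usage note is a toy example invented for this purpose, not the globin sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_stop_codon(cds):
    """Return the 1-based number of the first in-frame stop codon, or None."""
    cds = cds.upper().replace("U", "T")
    # Step through complete codons only, starting from the first base.
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3 + 1
    return None

def delete_base(cds, position):
    """Return cds with the base at 0-based position removed (a frameshift)."""
    return cds[:position] + cds[position + 1:]
```

With the toy sequence `ATGGCCTTGAAAGGCTAA`, the first stop codon is codon 6. After `delete_base(..., 3)` the reading frame shifts and a TGA appears at codon 3, so translation of the shifted frame would terminate prematurely.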

Therefore, it is an object of the present invention
to provide a method for preparing synthetic plant
genes which express their respective proteins at
relatively high levels when compared to wild-type
genes. It is yet another object of the present
invention to provide synthetic plant genes which
express the crystal protein toxin of Bacillus
thuringiensis at relatively high levels.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 illustrates the steps employed in
modifying a wild-type gene to increase expression
efficiency in plants.

Figure 2 illustrates a comparison of the changes in
the modified B.t.k. HD-1 sequence of Example 1 (lower
line) versus the wild-type sequence of B.t.k. HD-1
which encodes the crystal protein toxin (upper line).

Figure 3 illustrates a comparison of the changes in
the synthetic B.t.k. HD-1 sequence of Example 2 (lower
line) versus the wild-type sequence of B.t.k. HD-1
which encodes the crystal protein toxin (upper line).

Figure 4 illustrates a comparison of the changes in
the synthetic B.t.k. HD-73 sequence of Example 3
(lower line) versus the wild-type sequence of B.t.k.
HD-73 (upper line).

Figure 5 represents a plasmid map of intermediate
plant transformation vector cassette pMON893.